Transparent Real-Time Monitoring in MPI

Authors

  • Samuel H. Russ
  • Rashid Jean-Baptiste
  • Tangirala Shailendra Krishna Kumar
  • Marion G. Harmon
Abstract

MPI has emerged as a popular way to write architecture-independent parallel programs. By modifying an MPI library and associated MPI run-time environment, transparent extraction of timestamped information is possible. The wall-clock time at which specific MPI communication events begin and end can be recorded, collected, and provided to a central scheduler. The infrastructure to create and collect these events has been implemented and tested, and a future architecture that can use this information is described.
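The idea of recording when a communication event begins and ends, without changing the application code, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not call a real MPI library: `CommEvent`, `EventLog`, `traced`, `drain`, and `fake_send` are hypothetical names introduced here, and `fake_send` merely stands in for a library-level send routine. The sketch assumes each process keeps a local event buffer that is periodically handed to a central collector.

```python
import time
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List


@dataclass
class CommEvent:
    """One timestamped communication event (hypothetical record layout)."""
    rank: int        # process that performed the call
    name: str        # name of the wrapped communication routine
    t_begin: float   # wall-clock time when the call started
    t_end: float     # wall-clock time when the call returned


class EventLog:
    """Per-process buffer of events; the clock is injectable for testing."""

    def __init__(self, rank: int, clock: Callable[[], float] = time.time):
        self.rank = rank
        self.clock = clock
        self.events: List[CommEvent] = []

    def traced(self, func: Callable) -> Callable:
        """Wrap a communication routine so callers need no code changes."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            t_begin = self.clock()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the event even if the wrapped call raises.
                self.events.append(
                    CommEvent(self.rank, func.__name__, t_begin, self.clock())
                )
        return wrapper

    def drain(self) -> List[CommEvent]:
        """Hand the buffered events to a collector and clear the buffer."""
        batch, self.events = self.events, []
        return batch


def fake_send(payload: bytes, dest: int) -> int:
    """Stand-in for a send routine; returns the number of bytes 'sent'."""
    return len(payload)
```

In this sketch the application would call `log.traced(fake_send)` in place of `fake_send`, and a separate collector would call `drain()` and forward the batch to a scheduler. In a real MPI setting the interception would happen inside the library rather than through a decorator.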

Similar resources

Hector: User-Transparent Resource Allocation for MPI

Hector, a complete job scheduling and parallel run-time environment, is intended to present many features both to parallel and sequential jobs, including dynamic load balancing, checkpointing, near-real-time resource awareness, and transparency to the programmer/user. This describes some recent work on user-transparent enhancements to support load balancing, near-real-time resource awareness,...

μπ: a scalable and transparent system for simulating MPI programs

μπ is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of μπ are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (r...

MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Message-Passing Middleware for Performance-Portable Parallel Computing

MPI has proven effective for parallel applications in situations with neither QoS nor fault handling. Emerging environments motivate fault-tolerant MPI middleware. Environments include space-based, wide-area/web/meta computing, and scalable clusters. MPI/FT, the system described here, trades off sufficient MPI fault coverage against acceptable parallel performance, based on mission requirem...

MPI/FT™: Architecture and Taxonomies for Fault-Tolerant, Message-Passing Middleware for Performance-Portable Parallel Computing

MPI has proven effective for parallel applications in situations with neither QoS nor fault handling. Emerging environments motivate fault-tolerant MPI middleware. Environments include space-based, wide-area/web/meta computing, and scalable clusters. MPI/FT, the system described here, trades off sufficient MPI fault coverage against acceptable parallel performance, based on mission requirements...

Lightweight monitoring of MPI programs in real time

Current technologies allow efficient data collection by several sensors to determine the overall evaluation of the status of a cluster. However, no previous work of which we are aware analyzes the behavior of the parallel programs themselves in real-time. In this paper, we perform a comparison of different artificial intelligence techniques that can be used to implement a lightweight monitoring...

Publication date: 1999